Table of Contents

Data and Visualization
Data Sets
Codebooks
Events
Works-in-Progress
Discussion - Mailing list
Wishlist
Visualizations
Prior art
Distributed data collection
Shared data
Coordinated Twitter searching
Prior art
Twitter Firehose
Shared DB
Keywords for Twitter analysis
Keyword discovery
Sample list

Data and Visualization

Increasingly, as we participate in social movement activity, we leave data traces across the web: tweets, Facebook updates and likes, IRC conversations, and other online activities produce information that can later be gathered, analyzed, mined, and visualized. Web companies do this constantly; for most, gathering, analyzing, packaging, and selling user data is a main source of revenue. Intelligence agencies are also investing increasing resources in automated extraction of information from the social web. These developments have serious implications for privacy. At the same time, the tools to gather, analyze, and visualize large datasets are available to more people than ever before, including researchers, small organizations, and everyday individuals. This page is for sharing datasets, and for sharing information about how Occupy Researchers might collaborate to gather, share, analyze, and visualize data about the movement.

Initial motivating questions include:

Data Sets


Codebooks

Events

Maybe we can set up a chat soon to talk about all this. Write here if you are interested:

Works-in-Progress

Lab notebook style record of work that's underway but not yet ready for prime time.

Discussion - Mailing list

To stay up to date with this group's discussion, sign up for the mailing list:
http://groups.google.com/group/occupyresearch-data
I've just found another #occupydata list:
http://groups.google.com/group/occupydata

Wishlist


The link below points to a wishlist for researchers to express analyses, transformations, and reports they'd like to be able to make with Twitter data. It's not strictly for Occupy-related research but seems like a useful tool for us, too. Contributing to this page is as simple as writing a sentence starting with "I wish..."


Visualizations


Prior art


Distributed data collection


Shared data

Occupydata.org (coming soon)
Data on the DataHub: thedatahub.org/group/occupy
MapBox: tiles.mapbox.com/occupy
Github: https://github.com/occupydc

Coordinated Twitter searching


DIY data mining on Twitter:

Some quick+dirty experiments this week with the regular old Search API and the Streaming API demonstrate the fundamental challenge:

Next steps:

Prior art

Chirper (last update June 2011):

Additional:

Kevin Driscoll - currently have access to the “firehose” ($$$) and am collecting tweets on a few hundred keywords. Have a complete data set since ~Oct 12. Working on a licensing exemption so that we can share this data with others. Merging data is especially important because it is very difficult to acquire a complete corpus of relevant tweets. Analysis is very rudimentary at this point: frequency, keyword discovery, collecting links. Duration of access to the firehose is uncertain because of cost. Very interested in a distributed search method that coordinates multiple clients, using the firehose as a benchmark to test it. Also, a simple search client (non-firehose) is here: https://github.com/driscoll/quickndirty
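
The benchmarking idea in the note above could be tested roughly as follows. This is only a minimal sketch: it assumes each client can export the IDs of the tweets it collected, and that a firehose-derived set of IDs is available for the same keywords and time window. The function names and the sample IDs are made up for illustration.

```python
from typing import Iterable, Set


def merge_collections(collections: Iterable[Iterable[str]]) -> Set[str]:
    """Union the tweet IDs gathered by several independent clients."""
    merged: Set[str] = set()
    for ids in collections:
        merged.update(ids)
    return merged


def coverage(merged: Set[str], benchmark: Set[str]) -> float:
    """Fraction of benchmark tweet IDs that the merged collection contains."""
    if not benchmark:
        return 0.0
    return len(merged & benchmark) / len(benchmark)


# Made-up IDs, for illustration only.
client_a = ["101", "102", "105"]
client_b = ["102", "103"]
firehose_ids = {"101", "102", "103", "104", "105"}

merged_ids = merge_collections([client_a, client_b])
print(f"coverage: {coverage(merged_ids, firehose_ids):.0%}")  # coverage: 80%
```

Comparing on tweet IDs rather than on text avoids counting retweets or near-duplicate text as matches, but it assumes every client stores the ID.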

Twitter Spam on #OccupyBoston

From Takis:
"Since my research focuses on information reliability and trust in the real-time web, I put up a script connecting to the Twitter Streaming API and collected data for about 22 hours, containing one of the phrases: #occupyboston, Occupy Boston, Boston PD, #bpd. In total I received 17094 tweets by 8367 unique users. Searching their description field for the phrase “over 18”, I discovered 570 accounts who had sent 681 tweets. That is, 6.8% of users tweeting about the #occupyboston movement were spam bots of pornographic accounts. This might look like a small number, but keep in mind that I only searched for one kind of spam accounts (containing the words “over 18”). The point I want to make though is why do these accounts appear in the real-time stream at all. Twitter claims to use quality criteria in deciding what content is displayed in the public search stream. Don't the URLs of these accounts raise a red-flag in Twitter's quality algorithms since they are pointing to pornographic websites? Yes, I know that pornography is legal, but I was not searching for it, so why should I get to see pornographic accounts, especially some that are against Twitter rules (pornographic images in profile picture, copying content from other users, etc. https://support.twitter.com/articles/18311-the-twitter-rules) So, Twitter, please use some better algorithms in filtering content."

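The count described in the quote above (tweets, unique users, and accounts whose profile description contains a phrase) can be sketched as below. This is not Takis's script. It assumes each collected tweet has been reduced to a dict with hypothetical `user_id` and `description` keys, and the sample records are invented.

```python
from typing import Dict, Iterable, Optional, Tuple


def phrase_share(
    tweets: Iterable[Dict[str, Optional[str]]], phrase: str
) -> Tuple[int, int, int, float]:
    """Return (flagged_tweets, flagged_users, total_users, share_of_users)."""
    needle = phrase.lower()
    all_users = set()
    flagged_users = set()
    flagged_tweets = 0
    for tweet in tweets:
        user = tweet["user_id"]
        all_users.add(user)
        # A description may be missing or empty, so fall back to "".
        description = (tweet.get("description") or "").lower()
        if needle in description:
            flagged_users.add(user)
            flagged_tweets += 1
    share = len(flagged_users) / len(all_users) if all_users else 0.0
    return flagged_tweets, len(flagged_users), len(all_users), share


# Invented records, for illustration only.
sample = [
    {"user_id": "a", "description": "Student in Boston"},
    {"user_id": "b", "description": "Over 18 only, click here"},
    {"user_id": "b", "description": "Over 18 only, click here"},
    {"user_id": "c", "description": None},
    {"user_id": "d", "description": "journalist"},
]
print(phrase_share(sample, "over 18"))  # (2, 1, 4, 0.25)
```

With the figures in the quote, the share is 570 / 8367, or roughly 6.8% of users.
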
Twitter Firehose


Does anyone have ready access to the Twitter "firehose" that they can share?

Shared DB



Keywords for Twitter analysis


Keyword discovery


TODO: a simple script that identifies and suggests new keywords based on search results from the existing list
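
One possible starting point for this TODO is sketched below. It assumes the texts of tweets already returned by searches on the existing list are available as plain strings. It then counts the hashtags in them that are not on the list yet and suggests the frequent ones. The function name, the `min_count` threshold, and the sample tweets are all made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

HASHTAG = re.compile(r"#(\w+)")


def suggest_keywords(
    texts: Iterable[str], known: Iterable[str], min_count: int = 2
) -> List[Tuple[str, int]]:
    """Suggest hashtags that co-occur in search results but are not yet listed."""
    known_lower = {k.lstrip("#").lower() for k in known}
    counts: Counter = Counter()
    for text in texts:
        # Count each hashtag at most once per tweet.
        tags = {t.lower() for t in HASHTAG.findall(text)}
        counts.update(tags - known_lower)
    return [(tag, n) for tag, n in counts.most_common() if n >= min_count]


# Invented search results, for illustration only.
known = ["#ows", "occupyboston", "zuccotti"]
texts = [
    "Marching now #ows #examplemarch",
    "#occupyboston general assembly tonight #examplemarch",
    "Livestream from zuccotti #ows #exampleteachin",
    "#OWS #ExampleMarch #exampleteachin #onlyonce",
]
print(suggest_keywords(texts, known))
# [('examplemarch', 3), ('exampleteachin', 2)]
```

Suggestions would still need a human check before being added, since frequent co-occurring tags can be spam or unrelated trending topics.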

Sample list


note: perhaps run queries on occupy*, where * is a wildcard, rather than trying to keep up by adding each city.
note: the list below was compiled for the somewhat idiosyncratic requirements of the Gnip PowerTrack system.

Last update: October 30, 9:06PM PT

#nypd
#occupy
#ocs
#ows
#sdpd
99xmas
americanautumn
fraudclosure
freedomco
globalrevlive
globalrevolution
holidays99pct
holidayhomemade
iamthe53
iamthe99
moveyourmoney
occupy_boston
occupy_okc
occupyalabama
occupyalbany
occupyalbuquerque
occupyallentown
occupyallstreet
occupyannarbor
occupyarkansas
occupyashland
occupyashville
occupyathens
occupyatlanta
occupyatlantaga
occupyaustin
occupybaltimore
occupybaystreet
occupybeantown
occupyberkeley
occupybgm
occupybham
occupybinghamton
occupybloomington
occupyboise
occupybos_media
occupyboston
occupybuf
occupybuffalo
occupyburlington
occupycanada
occupycarbondale
occupycharlotte
occupychattanooga
occupychi
occupychicago
occupychico
occupychristmas
occupycincinnati
occupycincy
occupycleveland
occupycolleges
occupycolorado
occupycoloradosprings
occupycolumbus
occupycouch
occupycsu
occupydallas
occupydayton
occupydc
occupydcneeds
occupydenver
occupydesmoines
occupydetroit
occupyearth
occupyeducation
occupyelkhart
occupyelpaso
occupyeugene
occupyeureka
occupyeverywhere
occupyfindlay
occupyflagstaff
occupyflorida
occupyfortcollins
occupyfortwayne
occupyfresno
occupygrandrapids
occupyhonolulu
occupyhouston
occupyidahofalls
occupyindianapolis
occupyindy
occupyinlandempire
occupyinternet
occupyiowacity
occupyithaca
occupyjacksonville
occupyjax
occupykansascity
occupyketchum
occupyknoxville
occupykst
occupykstreet
occupyla
occupylansing
occupylasvegas
occupylawrence
occupylexington
occupylondon
occupylosangeles
occupylouisville
occupylsx
occupymadison
occupymarines
occupymaui
occupymcallen
occupymemorial
occupymemorialdr
occupymemphis
occupymia
occupymiami
occupyminneapolis
occupymissoula
occupymn
occupymoscow
occupymuseums
occupynashville
occupynation
occupynb
occupyneworleans
occupynewyorkcity
occupynj
occupynola
occupynorfolk
occupynorthampton
occupyns
occupyny
occupynyc
occupyoakland
occupyocala
occupyoklahomacity
occupyorlandofl
occupyoxnard
occupyphiladelphia
occupyphilly
occupyphoenix
occupyphx
occupyportland
occupyprov
occupyprovidence
occupyraleigh
occupyredlands
occupyresearch
occupyriverside
occupyrochester
occupyrockford
occupysac
occupysacramento
occupysalem
occupysaltlakecity
occupysanantonio
occupysandiego
occupysanfran
occupysanfrancisco
occupysanjose
occupysanluisobispo
occupysantabarbara
occupysantacruz
occupysantafe
occupysarasota
occupysavannah
occupysd
occupyseaside
occupyseattle
occupysf
occupyslc
occupyslo
occupysolidarity
occupysouthbend
occupysouthgate
occupyspokane
occupyspringfield
occupystandrews
occupystl
occupystlouis
occupysydney
occupysyracuse
occupytampa
occupythehood
occupythemedia
occupythenation
occupytogether
occupytoronto
occupytrenton
occupytucson
occupyus
occupyusa
occupyvancouver
occupyventura
occupyvermont
occupyvictoria
occupywallst
occupywallstnyc
occupywallstreet
occupywashingtondc
occupywichita
occupywinnipeg
occupywinstonsalem
occupyworcester
occupywriters
occupyx
occupyxmas
oct15
october15
owslosangeles
reclaimuc
theother99
wearethe53
wearethe99
weoccupyamerica
zuccotti
#moveyourmoney
#moveourmoney
"Credit Union"
"Credit Unions"
#banktransferday
#breakthebanks
creditunion
creditunions
"big banks"
BofA
"Bank of America"
"vampire squid"
"Too Big to Fail"
"TBTF"
"local bank"
"community bank"
"debit fees"
"debit card"
"wells fargo"
"good-guy banks"
"zombie banks"
#opcashback
#operationreturnmail
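
The occupy* wildcard note above the list could be prototyped on the list itself, or on already collected text, with a regular expression. This sketch makes no claim about whether any Twitter or Gnip endpoint accepts wildcards. It only shows which entries a single occupy* pattern would make redundant, and the function name is made up.

```python
import re
from typing import Iterable, List, Tuple

# Matches "occupy" followed by any word characters, with an optional leading "#".
OCCUPY_WILDCARD = re.compile(r"^#?occupy\w*$", re.IGNORECASE)


def split_by_wildcard(keywords: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split keywords into those covered by occupy* and those still needed."""
    covered: List[str] = []
    remaining: List[str] = []
    for kw in keywords:
        (covered if OCCUPY_WILDCARD.match(kw) else remaining).append(kw)
    return covered, remaining


# A few entries taken from the list above.
sample = ["#occupy", "occupy_boston", "occupyoakland",
          "owslosangeles", "zuccotti", '"Credit Union"']
covered, remaining = split_by_wildcard(sample)
print(covered)    # ['#occupy', 'occupy_boston', 'occupyoakland']
print(remaining)  # ['owslosangeles', 'zuccotti', '"Credit Union"']
```

A wildcard this broad would also match unrelated uses of the word, so the results would likely need filtering afterwards.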